Workshop 2 - Markdown and other topics

What to avoid

  • One way to perform and report an analysis is to do the analysis in R, then copy and paste your results and plots from R into Word, or similar.

  • But this isn’t a very good way of doing things because:

    • it takes a long time
    • there are lots of ways for errors to creep in
    • if you change something in your analysis you have to go back and re-do the copy-and-pasting (making sure you don’t miss anything)
    • it is not easy for someone else to check and reproduce your analysis.

Why markdown?

  • Using R Markdown helps avoid all of these problems.
  • With R Markdown, the analysis and the writing about it are kept in one place.
  • There’s no copying and pasting involved, and the ‘source’ can be shared with somebody else who can see exactly what you’ve done, check for errors, reproduce your results, and modify and extend the analysis if they wish. These are key principles in reproducible research.

How markdown works

  • The idea of R Markdown is that we write both text and R code into our input (.Rmd) file. We then ‘knit’ the file to produce the output document.

  • The R code is written into ‘code chunks’ like this:

```{r, eval=TRUE}
sample.mean <- mean(rnorm(100))
```
  • R code can also be evaluated mid-sentence, e.g. “The sum of the first 100 positive integers is `r sum(1:100)`”, which puts the answer (5050) in the right place in the text. To write an in-line code expression, type a backtick followed by r and a space, then the code, then a closing backtick.
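
As a rough illustration of what knitting does, the sketch below builds a tiny R Markdown source as a character vector and passes it to knitr::knit(), the function that runs the code when a file is knitted. The contents of rmd_source are made up for this illustration; in practice the source lives in a .Rmd file.

```r
library(knitr)

# Build the chunk fence programmatically so it does not clash with the
# fence around this example
fence <- strrep("`", 3)

# A tiny, made-up R Markdown source: one in-line expression and one chunk
rmd_source <- c(
  "The sum of the first 100 positive integers is `r sum(1:100)`.",
  "",
  paste0(fence, "{r}"),
  "x <- c(2, 4, 6)",
  "mean(x)",
  fence
)

# knit() runs the code and returns the resulting markdown as text
knitted <- knit(text = rmd_source, quiet = TRUE)
cat(knitted)
```

In the printed result the in-line expression has been replaced by its value, and the chunk is followed by its output.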

Recap of benefits

  • We can use the R code to perform calculations and produce plots, then write text around the output, so reports are very easy to make.
  • There’s no need to save/export/copy-and-paste results, and everything is recalculated if we go back and change something at the beginning.
  • Perfect for auditable analyses.

Rendering a Markdown file

  • Save the files markdownExample.Rmd and Cars.csv in the same folder.

  • The .Rmd file is the input file that will be knitted (meaning “rendered”) to produce the output. Read through and try to guess what each part does.

  • Then click on the button that says Knit.
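
The same step can also be run from the R console instead of using the button. This is a minimal sketch, assuming the rmarkdown package is installed and markdownExample.Rmd is in the current working directory. The file.exists() check is only there so that the snippet fails gracefully otherwise.

```r
input_file <- "markdownExample.Rmd"

if (file.exists(input_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  # render() knits the file and returns the path of the output document
  output_file <- rmarkdown::render(input_file)
  message("Output written to: ", output_file)
} else {
  # Assumption not met: the file is elsewhere, or rmarkdown is missing
  message("Cannot render: check that ", input_file,
          " is in ", getwd(), " and that rmarkdown is installed")
}
```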

Further markdown exercises

  1. Change the title, put yourself as the author, and put in today’s date.

  2. Add a new section, just before the Conclusion section, called ‘Bits I am adding’. In this section, add a new figure containing a plot of cars.data$'MPG (city)' versus cars.data$'MPG (highway)'. Give the figure a sensible caption, and write a sentence in the text referring to the figure.

  3. The command table(cars.data$Type) summarises the different types of vehicle in the data set. The command barplot(table(cars.data$Type)) shows the summary as a bar plot. Add this bar plot to the report.

  4. Add a sentence saying “The cheapest car in the data set (in thousands of dollars) costs” then use an in-line code expression with the command min(cars.data$'Minimum price') to finish off the sentence automatically.

Markdown + versioning with git

  • The “source” markdown files are plain text. This makes them well suited to version control, to keep track of edits.

  • Git is a version control system, and is very widely used.

  • Git can store the data too.

  • So everything stays in sync, can be “rolled back” to an earlier version, is easily shared, and is backed up.

Plotting

  • Many different ways to make nice plots:
    • using base R
    • using plotting packages such as ggplot2
    • using “methods” written for specific “object” types (such as model fits)
  • An example of the last case, a plotting “method” for a fitted linear model, is:
my_fit <- lm(time ~ dist, data = MASS::hills)  # data = avoids the need for attach()
plot(my_fit)  # uses the plotting method for "lm" objects
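
A short sketch of why plot() behaves differently for a model fit: plot() is a generic function, and R picks the version to run from the class of the object it is given. Passing the data through data = rather than attach() is a choice made for this illustration.

```r
# Fit the model, passing the data frame directly
my_fit <- lm(time ~ dist, data = MASS::hills)

# The fitted model has class "lm" ...
class(my_fit)

# ... so plot(my_fit) is handled by the plotting method for "lm" objects
plot_method <- getS3method("plot", "lm")
is.function(plot_method)

# Show the four default diagnostic plots on one page
old_par <- par(mfrow = c(2, 2))
plot(my_fit)
par(old_par)
```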

Base R plots versus ggplot2 (1)

  • Base R plots involve simple syntax, and produce simple output. E.g.
with(MASS::hills, plot(climb, time))  # with() avoids attaching the data a second time
  • Base R plotting is very powerful/versatile.

  • But making more complex plots often requires long (and hard-to-read) code.
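
To illustrate the last point, here is a sketch of a base R plot with labels, colours and a legend. Highlighting the races whose name contains 'Ben', and the colours used, are arbitrary choices for this illustration.

```r
hills <- MASS::hills

# TRUE for races whose name (stored in the row names) contains "Ben"
is_ben <- grepl("Ben", rownames(hills))
point_colours <- ifelse(is_ben, "red", "black")

# Scatter plot with axis labels and coloured points
plot(hills$climb, hills$time,
     col = point_colours, pch = 19,
     xlab = "Climb (feet)", ylab = "Record time (minutes)")

# Each extra feature needs its own call: labels above the points ...
text(hills$climb, hills$time, labels = rownames(hills),
     pos = 3, cex = 0.6, col = point_colours)

# ... and a legend built by hand
legend("topleft", legend = c("Name contains 'Ben'", "Other races"),
       col = c("red", "black"), pch = 19)
```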

Base R plots versus ggplot2 (2)

  • ggplot2 is a tidyverse package designed to work nicely with “tidy” data. It is loaded with
library(ggplot2)
  • ggplot2 has a steep learning curve at first: it takes some effort to get to grips with making even “easy” plots!

  • But on the other hand, later on it becomes quick to make complex ones.

Here are some examples, each building on the previous one (the |> pipe needs R 4.1 or later):

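# Mapping only: draws the axes but no data, since no geom has been added yet
MASS::hills |> ggplot(aes(x = climb, y = time))

# Add a layer of points
MASS::hills |> ggplot(aes(x = climb, y = time)) + geom_point()

# Label each point with its row name (the race name); labels may overlap
MASS::hills |> ggplot(aes(x = climb, y = time, label = rownames(MASS::hills))) + geom_point() + geom_text()

# ggrepel pushes overlapping labels apart
library(ggrepel)
MASS::hills |> ggplot(aes(x = climb, y = time, label = rownames(MASS::hills))) + geom_point() + geom_text_repel()

# Colour the points and labels by whether the race name contains 'Ben'
MASS::hills |> ggplot(aes(x = climb, y = time, label = rownames(MASS::hills), colour = grepl('Ben', rownames(MASS::hills)))) + geom_point() + geom_text_repel()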
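
Because a ggplot is an ordinary R object, it can be stored and extended later, which is part of what makes complex plots quick to build. This is a small sketch in which the object names, the fitted line and the axis labels are choices made for this illustration.

```r
library(ggplot2)

# Store the basic scatter plot in an object
base_plot <- ggplot(MASS::hills, aes(x = climb, y = time)) +
  geom_point()

# Add further layers and labels to the stored plot without retyping it
labelled_plot <- base_plot +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Climb (feet)", y = "Record time (minutes)",
       title = "Scottish hill races")

print(labelled_plot)
```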